ARMA Models

The ARMA model is a stationary model, which means we will need to assume (and verify) that our data are a realization of a stationary process. Because regression can be used to decompose a nonstationary series into a trend, seasonal components, and a residual series, it’s often reasonable to apply stationary models to the residuals from a time series regression.

A time series is called a mixed autoregressive moving average model of order \((p,q)\) if it is stationary and takes the form:

\[ x_t = \phi_1 x_{t-1} + ... + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + ... + \theta_q w_{t-q}\] We can observe that the model includes a white noise component, an AR component, and a moving average component. This equation assumes the series has already been demeaned.

We can rewrite this equation in terms of backshift operators and simplify in order to get a very concise version of the equation.

\[ x_t - \phi_1 x_{t-1} - ... - \phi_p x_{t-p} = w_t + \theta_1 w_{t-1} + ... + \theta_q w_{t-q}\] \[ (1-\phi_1 B - ... - \phi_p B^p) x_t = (1 + \theta_1 B + ... + \theta_q B^q) w_t\]

\[ \phi_p(B) x_t = \theta_q(B) w_t\]
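As a quick sketch, we can simulate a realization of such a process with arima.sim (the coefficients below are arbitrary choices for illustration):

```r
# simulate 200 observations of a stationary ARMA(2,1) process:
# x_t = 0.5 x_{t-1} - 0.25 x_{t-2} + w_t + 0.4 w_{t-1}
set.seed(42)
x <- arima.sim(model=list(ar=c(0.5, -0.25), ma=0.4), n=200)
plot(x, main='Simulated ARMA(2,1)')
```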

Important points about an \(ARMA(p,q)\) model:

If we wanted to accommodate a non-zero mean \(\mu\), then we would add a constant term \(\alpha = \mu (1-\phi_1 - ...- \phi_p)\) on the right-hand side:

\[ x_t = \alpha + w_t +[ \phi_1 x_{t-1} + ... + \phi_p x_{t-p} ] + [\theta_1 w_{t-1} + ... + \theta_q w_{t-q}]\]

The equation for \(\alpha\) can also be rearranged to recover the stationary mean \(\mu\):

\[ \mu = \frac{\alpha}{1-\phi_1 - ...- \phi_p}\]
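As a small check (with arbitrary simulation parameters): R's arima() reports the estimated mean \(\mu\) under the label intercept, and \(\alpha\) can be recovered from the fitted coefficients:

```r
set.seed(1)
# AR(1) with phi_1 = 0.6 around a mean of mu = 10
x <- 10 + arima.sim(model=list(ar=0.6), n=500)
fit <- arima(x, order=c(1,0,0))
mu    <- coef(fit)['intercept']   # despite the label, this is the mean mu
phi   <- coef(fit)['ar1']
alpha <- mu * (1 - phi)           # constant term: alpha = mu * (1 - phi_1)
c(mu=unname(mu), alpha=unname(alpha))
```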

Transforming \(ARMA(p,q)\) Model to an \(MA(\infty)\) or \(AR(\infty)\) Model

If we go back to the ARMA equation in terms of the backshift operator:

\[ \phi_p(B) x_t = \theta_q(B) w_t\]

we see that we can rearrange it to get an MA process (\(x_t\) as a function of \(w_t\)):

\[ x_t = \frac {\theta_q(B)}{ \phi_p(B)} w_t \equiv \psi(B) w_t \] Expanding \(1/\phi_p(B)\) as a power series (a geometric series when \(p=1\)), this becomes an \(MA(\infty)\) model.

Likewise, we can rearrange to achieve an \(AR(\infty)\) model (\(w_t\) as a function of \(x_t\)):

\[ w_t = \frac{ \phi_p(B)} {\theta_q(B)} x_t \equiv \pi(B) x_t \]
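The \(\psi\) weights of the \(MA(\infty)\) representation can be computed directly with ARMAtoMA() from the stats package (the coefficients here are arbitrary illustrative values):

```r
# psi weights of the MA(infinity) form of an ARMA(1,1) with
# phi_1 = 0.5, theta_1 = 0.3; psi_1 = phi_1 + theta_1, and the
# weights then decay geometrically at rate phi_1
psi <- ARMAtoMA(ar=0.5, ma=0.3, lag.max=8)
round(psi, 5)  # 0.8 0.4 0.2 0.1 0.05 0.025 0.0125 0.00625
```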

Derivation of Second Order Properties

Skipped. Terminology in async is completely inconsistent from slide to slide. It seems like it’s mainly used to derive an equation corresponding to the form of the ACF plot.

ACF and PACF plots for AR, MA, and ARMA Models

Reminder about the general properties of the ACF and PACF plots:

Plot    AR(p) Model                  MA(q) Model                  ARMA(p,q) Model
ACF     Tails off                    Abrupt cutoff after lag q    Tails off
PACF    Abrupt cutoff after lag p    Tails off                    Tails off

Because the ACF for \(ARMA(1,1)\) and \(AR(1)\) only differ by a constant, it’s hard to use ACF plots to distinguish between the models. In an \(AR(1)\) model, as \(\phi_1\) approaches 1, the series gets more persistent. However, even with a smaller \(\phi_1\), the addition of an \(MA(q)\) component will also contribute to persistence.

We may be able to more easily distinguish (in some cases) from a PACF plot. Recall that the PACF plot for an \(AR(p)\) model cuts off abruptly after \(p\) lags; a PACF plot for an \(ARMA(p,q)\) model generally will take longer to tail off.

In practice, we will estimate several models and use AIC, BIC, or forecasting a test set to decide between them.
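The patterns in the table can be checked against theoretical ACF/PACF values from ARMAacf() (coefficient values are arbitrary):

```r
# theoretical ACF of an MA(1) with theta_1 = 0.6: nonzero at lag 1, then cuts off
round(ARMAacf(ma=0.6, lag.max=5), 3)
# theoretical PACF of an AR(1) with phi_1 = 0.6: nonzero at lag 1, then cuts off
round(ARMAacf(ar=0.6, lag.max=5, pacf=TRUE), 3)
```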

Example: British Pound - NZ Dollar Exchange Rate

df <- read.table('pounds_nz.dat', header=TRUE)
bpnz <- ts(df$xrate)
str(bpnz)
##  Time-Series [1:39] from 1 to 39: 2.92 2.94 3.17 3.25 3.35 ...
head(bpnz)
## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
## [1] 2.924 2.942 3.172 3.254 3.348 3.507
tail(bpnz)
## Time Series:
## Start = 34 
## End = 39 
## Frequency = 1 
## [1] 3.050 3.187 3.229 3.192 3.352 3.531
par(mfrow=c(2,2))
plot(bpnz)
hist(bpnz)
acf(bpnz)
pacf(bpnz)

Observations:

  • Series is not stationary, so AR model is not appropriate
  • ACF tails off slowly rather than cutting off, so an MA model doesn’t seem a likely fit.

BPNZ - MA(5) Model

We’ll try an MA model anyway as a demo:

(ma5 <- arima(bpnz, order=c(0,0,5)))  # MA(5) model
## 
## Call:
## arima(x = bpnz, order = c(0, 0, 5))
## 
## Coefficients:
##         ma1    ma2    ma3    ma4   ma5  intercept
##       1.539  1.334  1.169  0.645  0.27      2.873
## s.e.  0.196  0.296  0.277  0.265  0.17      0.112
## 
## sigma^2 estimated as 0.015:  log likelihood = 24.17,  aic = -34.34
AIC(ma5)
## [1] -34.34
BIC(ma5)
## [1] -22.7

The first 4 coefficients are statistically significant, but we still need to do diagnostics:

ma5r <- resid(ma5)

par(mfrow=c(2,2))
plot(ma5r)
hist(ma5r)
# qqnorm(ma5r)
acf(ma5r)
pacf(ma5r)

Observations:

  • the residual TS plot doesn’t look entirely like white noise
  • ACF, PACF don’t show significant autocorrelations

We also do the Ljung-Box test for autocorrelation of the residual series. The null hypothesis is of independence (no correlation):

Box.test(ma5r, type="Ljung-Box")  # using default of 1 lag
## 
##  Box-Ljung test
## 
## data:  ma5r
## X-squared = 0.5, df = 1, p-value = 0.5

The high p-value says we cannot reject the null hypothesis of independence.
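Testing only one lag is a weak check. It’s common to jointly test a set of lags and, when testing residuals from a fitted model, reduce the degrees of freedom by the number of estimated ARMA coefficients via fitdf. A sketch on stand-in white noise (in practice we’d pass ma5r from above):

```r
set.seed(7)
r <- rnorm(100)   # stand-in for a residual series such as ma5r
# jointly test the first 10 autocorrelations; fitdf subtracts the number
# of estimated ARMA coefficients (p + q = 5 for an MA(5)) from the df
Box.test(r, lag=10, type="Ljung-Box", fitdf=5)
```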

Now let’s look at in-sample vs. out-of-sample fit:

par(mfrow=c(1,1))
ts.plot(bpnz, fitted(ma5), resid(ma5), lwd=c(1,2,1),
        lty=c('solid','dashed','dashed'), col=c('black','blue','black'),
        main='MA(5) model')

The in-sample fit looks reasonable. Here’s what an \(MA(5)\) model would forecast:

# do a forecast of 6 steps
(fcast <- forecast(ma5, h=6))
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 40          3.507 3.348 3.666 3.263 3.750
## 41          3.396 3.107 3.684 2.954 3.837
## 42          3.239 2.882 3.596 2.693 3.785
## 43          3.051 2.650 3.453 2.438 3.665
## 44          2.921 2.507 3.335 2.288 3.554
## 45          2.873 2.457 3.289 2.237 3.509
plot(fcast)
lines(fitted(ma5), lty='dashed', col='blue')

Let’s try back-testing for out-of-sample performance:

bpnz_bt <- window(bpnz, end=33)  # hold out last 6 observations for testing
ma5_bt <- arima(bpnz_bt, order=c(0,0,5))

fcast_bt <- forecast(ma5_bt, h=12)

par(mfrow=c(1,1))
plot(fcast_bt)
lines(window(bpnz, start=34), lty='solid', col='black')
lines(fitted(ma5_bt), lty='dashed', col='blue')

Observations:

  • the forecast is way off since the actual last 6 observations changed direction
  • the forecast stays flat after 5 periods because the \(MA(5)\) model can’t forecast further out

BPNZ - ARMA(1,1) Model

Next we’ll try an ARMA model for the sake of demo. (The requirement of stationarity is not actually met for this time series.)

(arma11 <- arima(bpnz, order=c(1,0,1)))  # ARMA(1,1) model
## 
## Call:
## arima(x = bpnz, order = c(1, 0, 1))
## 
## Coefficients:
##         ar1    ma1  intercept
##       0.892  0.532      2.960
## s.e.  0.076  0.202      0.244
## 
## sigma^2 estimated as 0.0151:  log likelihood = 25.14,  aic = -42.27
AIC(arma11)
## [1] -42.27
BIC(arma11)
## [1] -35.62
arma11r <- resid(arma11)
par(mfrow=c(2,2))
plot(arma11r)
hist(arma11r)
acf(arma11r)
pacf(arma11r)

Box.test(arma11r, type="Ljung-Box")  # using default of 1 lag
## 
##  Box-Ljung test
## 
## data:  arma11r
## X-squared = 0.014, df = 1, p-value = 0.9

Observations:

  • AIC and BIC are both better than the MA(5) model
  • residual TS plot looks more like white noise
  • Ljung-Box test doesn’t reject \(H_0\) of uncorrelated residual series
  • ACF, PACF don’t show significant correlations

This looks like a better model for this data.

Now let’s look at in-sample fit and forecast.

par(mfrow=c(1,1))
ts.plot(bpnz, fitted(arma11), resid(arma11), lwd=c(1,2,1),
        lty=c('solid','dashed','dashed'), col=c('black','blue','black'),
        main='ARMA(1,1) model')

In-sample fit looks reasonable, although generally shifted slightly to the right of actual.

(fcast <- forecast(arma11, h=6))
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 40          3.532 3.375 3.690 3.292 3.773
## 41          3.471 3.197 3.744 3.052 3.889
## 42          3.416 3.077 3.755 2.898 3.934
## 43          3.367 2.984 3.750 2.781 3.952
## 44          3.323 2.908 3.738 2.689 3.957
## 45          3.284 2.846 3.722 2.614 3.954
plot(fcast)
lines(fitted(arma11), lty='dashed', col='blue')

Observations:

  • forecast still trends downwards, but not as fast as the MA(5) model

Now, back-test, holding back 6 observations:

arma11_bt <- arima(bpnz_bt, order=c(1,0,1))
fcast_bt <- forecast(arma11_bt, h=12)

par(mfrow=c(1,1))
plot(fcast_bt)
lines(window(bpnz, start=34), lty='solid', col='black')
lines(fitted(arma11_bt), lty='dashed', col='blue')

Observations:

  • forecast still deviates from actual values
  • the 95% prediction interval of the forecast includes most of the actuals

ARIMA Models

ARIMA models are one way of dealing with nonstationary time series, which may have trends or seasonal effects. Simple “differencing” can often convert a nonstationary time series to a stationary one. Often a first-order difference is sufficient (but generally we don’t want to go higher than 2nd order).

Note: not all nonstationary time series can be dealt with by differencing. Notably, the volatility clustering (conditional heteroskedasticity) that’s common in financial time series requires a different kind of model; it is commonly modeled with Autoregressive Conditional Heteroskedastic (ARCH) models.

The term “integrated” arises from the fact that a differenced series needs to be aggregated in order to recover the original. The simplest \(I(0)\) process is white noise; the simplest \(I(1)\) process is the random walk (because after first differencing, we have a white noise model).
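The “integration” is literal: diffinv() cumulatively sums a differenced series back to the original, given the initial value:

```r
set.seed(3)
x <- cumsum(rnorm(20))           # a random walk: the simplest I(1) series
d <- diff(x)                     # first difference: white noise
x_back <- diffinv(d, xi=x[1])    # integrate back, seeded with x[1]
all.equal(x_back, x)             # TRUE: differencing is undone exactly
```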

First Differencing of a Random Walk

For example, a random walk has the following form:

\[ x_t = x_{t-1} + w_t\]

Rearranged, we get a stationary white noise series \(w_t \sim N(0,\sigma_w^2)\):

\[ x_t - x_{t-1} \equiv \nabla x_t = w_t \]

Just a quick refresher on what random walk time series look like. Note that the drift term plays the same role as the slope in deterministic linear trend models.

par(mfrow=c(2,1))

# without drift
x <- w <- rnorm(100)
for (t in 2:length(x)) {
  x[t] <- x[t-1] + w[t]
}
plot(ts(x), main='Random Walk Simulation')

# with drift
x <- w <- rnorm(100)
del <- 0.5
for (t in 2:length(x)) {
  x[t] <- del + x[t-1] + w[t]
}
plot(ts(x), main='Random Walk Simulation w/ Drift (delta=0.5)')

As a reminder, the expectation value grows over time due to the drift:

\[E(x_t)=x_0 + t \delta\]

and the variance grows without bounds:

\[ Var(x_t) = t \sigma^2\]

The autocovariance also depends on \(t\) and simplifies to:

\[\gamma_k(t) = t \sigma^2\]

Because autocovariance is a function of time, this model is obviously nonstationary.
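A quick Monte Carlo check of \(Var(x_t) = t \sigma^2\) (the simulation sizes are arbitrary):

```r
set.seed(11)
nsim <- 5000; tmax <- 50
# each column of 'walks' is one simulated random walk with sigma = 1
walks <- apply(matrix(rnorm(nsim * tmax), nrow=tmax), 2, cumsum)
# sample variance across simulations at t = 10, 25, 50 vs. theory (t * sigma^2)
rbind(empirical   = apply(walks[c(10, 25, 50), ], 1, var),
      theoretical = c(10, 25, 50))
```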

First Differencing of a Linear Trend

First differencing can also remove deterministic trends. If we have a linear trend of the following form, we can consider either first differencing (which results in an \(MA(1)\) process) or simply subtracting the trend (and analyzing residuals):

\[ x_t = a + bt + w_t\] \[ \nabla x_t = x_t - x_{t-1} = (a - a) + (bt - b(t-1)) + (w_t - w_{t-1}) = b + w_t - w_{t-1}\] Reducing and rewriting in terms of the lag operator, we have a stationary \(MA(1)\) process:

\[ \nabla x_t = b + (1 - B) w_t\] Alternately, we could have subtracted the trend (instead of differencing) to achieve a white noise process:

\[ x_t - (a + bt) = a + bt + w_t - (a + bt) = w_t\] If our original time series showed increasing variance over time, we should try a log transformation (and then differencing, if there’s also a trend we want to try to remove).
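A small comparison of the two approaches on a simulated linear trend (the values of a, b, and n are arbitrary):

```r
set.seed(5)
n <- 200; t <- 1:n
x <- 2 + 0.1 * t + rnorm(n)    # x_t = a + b t + w_t, with a = 2, b = 0.1
d <- diff(x)                   # differencing: b + w_t - w_{t-1}, an MA(1)
r <- resid(lm(x ~ t))          # subtracting the fitted trend: ~ white noise
# the differenced series has mean ~ b and lag-1 autocorrelation near -0.5
# (the MA(1) signature), while the detrended residuals show no autocorrelation
c(mean_diff      = mean(d),
  acf1_diff      = acf(d, plot=FALSE)$acf[2],
  acf1_detrended = acf(r, plot=FALSE)$acf[2])
```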

ARIMA Terminology

An \(ARIMA(p,d,q)\) model is an \(ARMA(p,q)\) model that’s applied after taking the \(d^{th}\) difference of the original time series \(x_t\). Using the lag operator, this can be expressed as:

\[ \phi_p(B)(1-B)^d x_t = \theta_q(B) w_t\]

Simulate an ARIMA time series and estimate it

Let’s simulate this model:

\[ x_t = 0.5 x_{t-1} + x_{t-1} - 0.5 x_{t-2} + w_t +0.3 w_{t-1}\] Rearranging: \[ x_t - x_{t-1} = 0.5(x_{t-1} - x_{t-2}) + w_t + 0.3 w_{t-1}\] \[ x_t - x_{t-1} - 0.5(x_{t-1} - x_{t-2}) = w_t + 0.3 w_{t-1}\] \[ \nabla x_t - 0.5 \nabla x_{t-1} = w_t + 0.3 w_{t-1}\] Since the equation involves only lagged differences of \(x_t\) and lagged \(w_t\), we can rewrite it in terms of the lag operator. This has the form of an \(ARIMA(1,1,1)\) model:

\[ (1 - 0.5B) \nabla x_t = (1 + 0.3B)w_t\] \[ \nabla x_t = 0.5B \nabla x_t + (1+0.3B)w_t\] \[ \nabla x_t - 0.5 \nabla x_{t-1} = w_t + 0.3 w_{t-1} \] We see that after applying the first differencing operator \(\nabla\), the \(ARIMA(1,1,1)\) model is transformed to a stationary \(ARMA(1,1)\) model with AR parameter \(\phi_1 = 0.5\) and MA parameter \(\theta_1 = 0.3\). (Be careful of signs!)

Let’s simulate this equation, and then compare plots between the original and the differenced equation:

x <- w <- rnorm(100)
for (t in 3:100) {
  x[t] <- 0.5 * x[t-1] + x[t-1] - 0.5 * x[t-2] + w[t] + 0.3 * w[t-1]
}
x_ts <- ts(x)

# alternately, we could have asked arima.sim to run the simulation 
# x_ts <- arima.sim(model=list(order=c(1,1,1), ar=0.5, ma=0.3), n=100)

par(mfrow=c(1,2))
plot(x_ts, main='Original Simulated Time Series')
plot(diff(x_ts), main='Differenced Simulated Time Series')

acf(x_ts, main='Original Simulated Time Series')
acf(diff(x_ts), main='Differenced Simulated Time Series')

pacf(x_ts, main='Original Simulated Time Series')
pacf(diff(x_ts), main='Differenced Simulated Time Series')

Observations:

  • Differenced TS appears more stationary on the TS plot
  • ACF of differenced TS dies off much more quickly; suggests MA component
  • PACF of original and differenced TS die off after 1 lag; maybe AR(1)?

Now let’s estimate the model and plot the original and fitted time series.

(arima111 <- arima(x_ts, order=c(1,1,1)))
## 
## Call:
## arima(x = x_ts, order = c(1, 1, 1))
## 
## Coefficients:
##         ar1    ma1
##       0.538  0.291
## s.e.  0.113  0.122
## 
## sigma^2 estimated as 1.03:  log likelihood = -142.4,  aic = 290.7
(arma11_diffed <- arima(diff(x_ts), order=c(1,0,1)))
## 
## Call:
## arima(x = diff(x_ts), order = c(1, 0, 1))
## 
## Coefficients:
##         ar1    ma1  intercept
##       0.537  0.292      0.122
## s.e.  0.113  0.122      0.281
## 
## sigma^2 estimated as 1.03:  log likelihood = -142.3,  aic = 292.5
par(mfrow=c(1,1))
ts.plot(x_ts, fitted(arima111), lwd=c(1,2),
        lty=c('solid','dashed'), col=c('black','blue'),
        main='ARIMA(1,1,1) model')

We see essentially the same results when we manually difference before fitting an ARMA model. Our estimated coefficients are quite close to the true (supposedly unknown) process that we simulated. Our fitted values from the model are quite close to the originals.

Now let’s do our standard diagnostics with the model’s residuals:

arima111r <- resid(arima111)
par(mfrow=c(2,2))
plot(arima111r, main='Residual TS')
hist(arima111r)
acf(arima111r)
pacf(arima111r)

Box.test(arima111r, type='Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  arima111r
## X-squared = 0.063, df = 1, p-value = 0.8

The plots and Ljung-Box test both fail to reject independence (i.e., supporting the residual TS as a realization of a white noise process), indicating that we should be able to forecast with this model.

Let’s attempt forecasting:

(fcast <- forecast(arima111, h=12))
##     Point Forecast   Lo 80 Hi 80   Lo 95 Hi 95
## 101          9.038  7.7365 10.34  7.0475 11.03
## 102          9.638  6.9246 12.35  5.4881 13.79
## 103          9.962  5.9439 13.98  3.8171 16.11
## 104         10.136  4.9519 15.32  2.2078 18.06
## 105         10.229  4.0057 16.45  0.7111 19.75
## 106         10.280  3.1224 17.44 -0.6665 21.23
## 107         10.307  2.3025 18.31 -1.9348 22.55
## 108         10.322  1.5407 19.10 -3.1076 23.75
## 109         10.329  0.8300 19.83 -4.1986 24.86
## 110         10.334  0.1637 20.50 -5.2199 25.89
## 111         10.336 -0.4644 21.14 -6.1818 26.85
## 112         10.337 -1.0595 21.73 -7.0925 27.77
par(mfrow=c(1,1))
plot(fcast)
lines(fitted(arima111), lty='dashed', col='blue')

We observe that while the model fit is good, the forecast is essentially a flat line.

Seasonal ARIMA Models (SARIMA)

These are basically an extension of ARIMA models that add lags at the seasonal period in order to capture seasonal effects. The form of the model is \(ARIMA(p,d,q)(P,D,Q)_m\), where \(p,d,q\) are the nonseasonal terms, \(P,D,Q\) are the seasonal terms, and \(m\) is the number of periods per year (e.g. \(m=12\) for monthly seasonality, or \(m=4\) for quarterly seasonality). The \(P,D,Q\) terms basically just represent backshifts of a seasonal period. We use lowercase \(\phi_p,\theta_q\) to represent nonseasonal components, and \(\Phi_P, \Theta_Q\) to represent the seasonal terms:

\[ \Phi_P(B^m) (1-B^m)^D \phi_p(B) (1-B)^d x_t = \Theta_Q (B^m) \theta_q(B) w_t\] These terms correspond to:

\[ [\text{Seasonal AR(P)}][\text{Seasonal Diff}][\text{Non-seasonal AR(p)}][\text{Non-seasonal Diff}] = [\text{Seasonal MA(Q)}][\text{Non-seasonal MA(q)}]\]

For example, a quarterly \(ARIMA(1,1,1)(1,1,1)_4\) model looks like:

\[ (1-\Phi_1 B^4) (1-B^4)^1 (1-\phi_1 B) (1-B)x_t = (1+ \Theta_1B^4)(1+\theta_1B)w_t\]

SARIMA Examples

EXAMPLE 1: \(ARIMA(0,0,0)(1,0,0)_{12}\)

This model, in which a monthly value is affected by the same month’s value in the previous year, is:

\[x_t = \alpha x_{t-12} + w_t\]

The characteristic equation for that formula is:

\[ (1 - \alpha B^{12}) = 0\] Rearranging, we get:

\[ B = \left(\frac{1}{\alpha}\right)^{1/12} = \alpha^{-1/12}\] The model is stationary when \(|\alpha^{-1/12}|>1\), i.e. when \(|\alpha|<1\).
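We can verify the stationarity condition numerically with polyroot(): the roots of the characteristic polynomial must lie outside the unit circle, which holds exactly when \(|\alpha| < 1\) (the value \(\alpha = 0.5\) below is arbitrary):

```r
alpha <- 0.5
# coefficients of 1 - alpha * B^12, in increasing order of powers of B
roots <- polyroot(c(1, rep(0, 11), -alpha))
min(Mod(roots))   # all 12 roots have modulus |alpha|^(-1/12) > 1, so stationary
```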

EXAMPLE 2: \(ARIMA(0,1,0)(1,0,0)_{12}\)

\[ x_t = x_{t-1} + \alpha x_{t-12} - \alpha x_{t-13} + w_t\] We could also write as: \[ \nabla x_t = \alpha \nabla x_{t-12} + w_t\] which helps us understand the intuition that the change at time \(t\) depends on the change at the same time of the previous year. Rearranging and factorizing gives:

\[ \Phi_1(B^{12})(1-B)x_t = (1-\alpha B^{12})(1-B)x_t = w_t\] We see the seasonal AR(1) term, with \(m=12\), and the nonseasonal difference term. This model is nonstationary, since we see that there is a root of \(B=1\).

EXAMPLE 3: Variants of seasonal moving average models

A quarterly, seasonal moving average model \(ARIMA(0,0,0)(0,0,1)_4\) (stationary, without a trend) is:

\[ x_t = (1-\beta B^4)w_t = w_t - \beta w_{t-4}\] If a nonseasonal stochastic trend were also present, an \(ARIMA(0,1,0)(0,0,1)_4\) model (using a 1st order difference to remove the trend) could look like:

\[ x_t = x_{t-1} + w_t - \beta w_{t-4}\] If, instead, a seasonal stochastic trend were present, the \(ARIMA(0,0,0)(0,1,1)_4\) model could be used to remove it:

\[ x_t = x_{t-4} + w_t - \beta w_{t-4}\]

Worked example: EU Retail Trade Index (Quarterly)

library(fpp) # sample datasets for Rob Hyndman's book
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: tseries
## Warning: package 'tseries' was built under R version 3.4.4
data(euretail)
str(euretail)
##  Time-Series [1:64] from 1996 to 2012: 89.1 89.5 89.9 90.1 89.2 ...

Plot the time series:

par(mfrow=c(3,1))
# original time series
plot(euretail)
# take seasonal difference (still not stationary)
plot(diff(euretail, lag=4))
# take nonseasonal and seasonal differences (better)
plot(diff(diff(euretail, lag=4)))

Now let’s try modeling it. Note that if we use sarima(...), then we need a slightly different form to retrieve the residuals. The function comes with a nice default display.

library(astsa)
## 
## Attaching package: 'astsa'
## The following object is masked from 'package:fpp':
## 
##     oil
## The following objects are masked from 'package:fma':
## 
##     chicken, sales
## The following object is masked from 'package:forecast':
## 
##     gas
eu_arima <- sarima(euretail, 0,1,1,0,1,1,4)
## initial  value -0.629077 
## iter   2 value -0.801933
## iter   3 value -0.814397
## iter   4 value -0.827539
## iter   5 value -0.829110
## iter   6 value -0.829179
## iter   7 value -0.829179
## iter   8 value -0.829179
## iter   8 value -0.829179
## iter   8 value -0.829179
## final  value -0.829179 
## converged
## initial  value -0.830649 
## iter   2 value -0.831162
## iter   3 value -0.831170
## iter   4 value -0.831171
## iter   4 value -0.831171
## iter   4 value -0.831171
## final  value -0.831171 
## converged

eu_arima$ttable  # display estimated parameters
##      Estimate     SE t.value p.value
## ma1    0.2901 0.1118   2.595   0.012
## sma1  -0.6909 0.1197  -5.771   0.000
eu_arima$AICc  # AIC, BIC are also available
## [1] -0.608
par(mfrow=c(2,2))
eu_arimar <- resid(eu_arima$fit)
plot(eu_arimar)
hist(eu_arimar)
acf(eu_arimar)
pacf(eu_arimar)

Since both the ACF and PACF show significant spikes at lag \(k=2\), this suggests we need to include additional nonseasonal terms. (We don’t see significant seasonal lags.) Also observe that the Ljung-Box plot (nice!) rejects the null hypothesis of independence of the residuals. So, we need to do more work to find a better model. Next, I’ll go straight to the best model that the book found:

#eu_arima <- sarima(euretail, 0,1,3,0,1,1,4)
# eu_arima$ttable  # display estimated parameters
# eu_arima$AICc  # AIC, BIC are also available
# eu_arimar <- resid(eu_arima$fit)

(eu_arima <- arima(euretail, order=c(0,1,3), seasonal=list(order=c(0,1,1), period=4)))
## 
## Call:
## arima(x = euretail, order = c(0, 1, 3), seasonal = list(order = c(0, 1, 1), 
##     period = 4))
## 
## Coefficients:
##         ma1    ma2    ma3    sma1
##       0.263  0.370  0.419  -0.661
## s.e.  0.124  0.126  0.130   0.156
## 
## sigma^2 estimated as 0.145:  log likelihood = -28.7,  aic = 67.4
par(mfrow=c(2,2))
eu_arimar <- resid(eu_arima)
plot(eu_arimar)
hist(eu_arimar)
acf(eu_arimar)
pacf(eu_arimar)

Box.test(eu_arimar, type='Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  eu_arimar
## X-squared = 0.003, df = 1, p-value = 1

NOTE: I used arima instead of sarima because the latter produced an object that I couldn’t pass to forecast.

Observations:

  • Residual TS, ACF, PACF appear representative of white noise (no significant lags)
  • AICc is better than for the earlier model
  • Ljung-Box test fails to reject the null hypothesis of independence of residuals (good)

Now, let’s try a forecast:

(fcast <- forecast(eu_arima, h=12))
##         Point Forecast Lo 80 Hi 80 Lo 95  Hi 95
## 2012 Q1          95.21 94.72 95.70 94.47  95.96
## 2012 Q2          95.29 94.50 96.07 94.08  96.49
## 2012 Q3          95.38 94.26 96.50 93.67  97.09
## 2012 Q4          95.40 93.90 96.91 93.11  97.70
## 2013 Q1          94.63 92.73 96.53 91.72  97.54
## 2013 Q2          94.64 92.39 96.90 91.20  98.09
## 2013 Q3          94.64 92.06 97.23 90.69  98.60
## 2013 Q4          94.67 91.75 97.58 90.21  99.12
## 2014 Q1          93.89 90.61 97.17 88.87  98.91
## 2014 Q2          93.91 90.28 97.53 88.36  99.46
## 2014 Q3          93.91 89.94 97.88 87.84  99.98
## 2014 Q4          93.93 89.62 98.24 87.34 100.52
par(mfrow=c(1,1))
plot(fcast)
lines(fitted(eu_arima), lty='dashed', col='blue')

Observations:

  • the forecast follows the most recent trend
  • due to the wide prediction intervals, neither increasing nor decreasing trends can be ruled out

Worked example: AUS Electricity Production

First load and take a quick look at the dataset.

cbe <- read.table('cbe.dat', header=TRUE)
head(cbe)
elec <- ts(cbe$elec, start=1958, freq=12)
str(elec)
##  Time-Series [1:396] from 1958 to 1991: 1497 1463 1648 1595 1777 1824 1994 1835 1787 1699 ...
head(elec)
##       Jan  Feb  Mar  Apr  May  Jun
## 1958 1497 1463 1648 1595 1777 1824
tail(elec)
##        Jul   Aug   Sep   Oct   Nov   Dec
## 1990 14002 14338 12867 12761 12449 12658
summary(elec)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1463    3239    5891    6312    8820   14338
quantile(elec, c(0.01,0.05,0.1,0.25,0.5,0.75,0.9,0.95,0.99))
##    1%    5%   10%   25%   50%   75%   90%   95%   99% 
##  1597  1848  2108  3239  5891  8820 11332 12192 13844

Now look at the plots.

par(mfrow=c(2,2))
plot(elec)
hist(elec)
acf(elec)
pacf(elec)

The first difference removes the trend, but not the increasing variance. The first difference of the log-transformed time series removes both:

par(mfrow=c(3,1))
plot(elec)
plot(diff(elec), main='First Difference of TS')
plot(diff(log(elec)), main='First Difference of Log(TS)')

We can see the improvement quantitatively:

summary(elec)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1463    3239    5891    6312    8820   14338
summary(diff(elec))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1471.0  -197.0   -30.0    28.3   258.5  1383.0
summary(diff(log(elec)))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.14533 -0.03832 -0.00675  0.00540  0.05105  0.18836

Now look at the ACF and PACF of the differenced and log-transformed dataset:

elec_trans <- diff(log(elec))
par(mfrow=c(1,2))
acf(elec_trans, lag.max=48)
pacf(elec_trans, lag.max=48)

Both still show seasonal effects, made even more clear by extending the lag axis. I can’t duplicate the results in the async slides, so I’ll instead replicate the best model found in the book. (See Cowpertwait page 144 for an example of a function to determine the best parameters.)

(elec_arima <- arima(log(elec), order=c(0,1,1),
                     seasonal=list(order=c(2,0,2), period=frequency(log(elec)))))
## 
## Call:
## arima(x = log(elec), order = c(0, 1, 1), seasonal = list(order = c(2, 0, 2), 
##     period = frequency(log(elec))))
## 
## Coefficients:
##          ma1   sar1   sar2    sma1    sma2
##       -0.640  0.481  0.514  -0.079  -0.471
## s.e.   0.046  0.238  0.236   0.223   0.142
## 
## sigma^2 estimated as 0.000417:  log likelihood = 956.8,  aic = -1902
elec_arimar <- resid(elec_arima)
par(mfrow=c(2,2))
plot(elec_arimar)
hist(elec_arimar)
acf(elec_arimar)
pacf(elec_arimar)

Box.test(elec_arimar, type='Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  elec_arimar
## X-squared = 0.017, df = 1, p-value = 0.9

Observations:

  • Residual TS, ACF, PACF appear representative of white noise (no significant lags)
  • Ljung-Box test fails to reject the null hypothesis of independence of residuals (good)

Now, try forecasting:

(fcast <- forecast(elec_arima, h=12))
##          Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1991          9.442 9.416 9.468 9.402 9.482
## Feb 1991          9.412 9.384 9.440 9.369 9.454
## Mar 1991          9.492 9.462 9.521 9.447 9.537
## Apr 1991          9.433 9.402 9.464 9.386 9.480
## May 1991          9.532 9.499 9.564 9.482 9.581
## Jun 1991          9.574 9.540 9.608 9.523 9.625
## Jul 1991          9.610 9.575 9.645 9.557 9.664
## Aug 1991          9.606 9.570 9.642 9.550 9.661
## Sep 1991          9.511 9.474 9.548 9.454 9.568
## Oct 1991          9.511 9.472 9.549 9.452 9.570
## Nov 1991          9.478 9.439 9.518 9.418 9.539
## Dec 1991          9.488 9.448 9.529 9.426 9.551
par(mfrow=c(1,1))
plot(fcast)
lines(fitted(elec_arima), lty='dashed', col='blue')

Observations:

  • the fitted points very closely follow the actual observations
  • finally we see a time series where the CI on the forecast is quite tight!

We need to manually exponentiate fitted and forecasted values if we want them on the original scale:

# exponentiate the forecasted values
exp_fitted <- exp(fcast$fitted)
exp_fcast <- exp(fcast$mean)
exp_upper95 <- exp(fcast$upper[,'95%'])
exp_lower95 <- exp(fcast$lower[,'95%'])

ts.plot(elec, exp_fcast, exp_fitted, exp_lower95, exp_upper95, lwd=c(1,2,1,1,1),
        lty=c('solid','dashed', 'dashed','solid','solid'), col=c('black','blue','blue','gray','gray'),
        main='Exponentiated Forecast (Original Units)')

Review: General Steps in Time Series Analysis

  1. Based on theory, subject matter knowledge, experience - choose a useful class of models
  2. Cleanse the data
  3. Conduct EDA: plot and examine main patterns, atypical observations:
    • Trend
    • Fluctuation around a trend (seasonal or cyclical/non-seasonal)
    • Sharp changes in behavior
    • Outliers
    • See if ACF, PACF plots suggest we should try \(AR(p)\) or \(MA(q)\) models
  4. Examine and statistically test for stationarity
  5. If not stationary (and model requires it), transform it:
    • Detrend
    • Remove seasonality
    • Log transformation (especially if variance increases with time)
    • Take 1-2 differences
  6. Model TS with stationary or integrated TS model
  7. Examine validity of model’s underlying assumptions:
    • Plot residual time series, histogram, ACF, and PACF (try to confirm resids look like white noise)
    • Ljung-Box test of the residual time series (failing to reject \(H_0\) is good)
  8. Among valid models, pick the one with best AIC, AICc, BIC, test set, etc… performance
  9. Conduct inference and/or forecasting (only if model assumptions were validated!)
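Steps 6-8 can be partially automated; forecast::auto.arima(), for example, does a stepwise search over \((p,d,q)(P,D,Q)\) orders by AICc. A minimal base-R sketch of the same idea (a small AIC grid search on simulated data, with arbitrary order bounds):

```r
set.seed(9)
# simulate an ARIMA(1,1,1) process, then search over (p,q) with d = 1 fixed
x <- arima.sim(model=list(order=c(1,1,1), ar=0.5, ma=0.3), n=200)
grid <- expand.grid(p=0:2, q=0:2)
grid$aic <- mapply(function(p, q)
  tryCatch(AIC(arima(x, order=c(p, 1, q))), error=function(e) Inf),
  grid$p, grid$q)
grid[which.min(grid$aic), ]   # the candidate order with the best (lowest) AIC
```

Any winner from such a search should still pass the residual diagnostics in step 7 before being used for inference or forecasting.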